\documentclass[a4paper,%
11pt,%
DIV14,
headsepline,%
headings=normal,
]{scrartcl}
%\usepackage{ngerman}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}




% stuff for getting the bold monospace font to work
\usepackage{bold-extra}
%\usepackage[scaled=0.85]{beramono}



% for matlab code
% bw = blackwhite - optimized for print, otherwise source is colored
%\usepackage[framed,numbered,bw]{mcode}

% for other code
%\usepackage{listings}

%\usepackage[ansinew]{inputenc}

\usepackage{color}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{listings}
\usepackage[hang, font = footnotesize]{caption}
%\usepackage{german}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{enumitem}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{mathrsfs}
\usepackage{mathalpha}
\usepackage{wrapfig}

\renewcommand{\thesubsection}{Ex~\arabic{section}.\arabic{subsection}}
\newcommand{\solution}{\textbf{Solution} }

\lstset{basicstyle=\footnotesize\ttfamily,breaklines=true}

\lstset{morekeywords={atomic, int, bool, return, if, for, while, do, switch, case, class}, morecomment=[l]//, morecomment=[s]{/*}{*/}}

\begin{document}





\hrule height 1px
\vspace*{1ex}
\begin{minipage}[t]{.45\linewidth}
\strut\vspace*{-\baselineskip}\newline
\includegraphics[height=.9cm]{res/Inf-Logo_black_en.pdf}
\includegraphics[height=.9cm]{res/par-logo.pdf}
\end{minipage}
\hfill
\begin{minipage}[t]{.5\linewidth}
\flushright{
Research Group for Parallel Computing\\%
Faculty of Informatics\\%
TU Wien}
\end{minipage}
\vspace*{1ex}

\hrule 

\vspace*{2ex}

\begin{center}
{\LARGE\textbf{Advanced Multiprocessor Programming}}\\
{\large{}%
	Summer term 2023 \\
	Theory exercise 2 of 2 \\
\vspace*{2ex}
Norbert Tremurici (11907086)

%\semestershort\\
%\assignmentname{0}}
}
\end{center}
\hfill
\begin{minipage}[b]{1\linewidth}
\hfill
\begin{tabular}{@{}ll@{}}
Issue date: & 2023-04-23\\
Due date:   & \color{red}{2023-05-08 (23:59)}
\end{tabular}
\end{minipage}

\vspace*{3ex}
\hrule 
\vspace*{1ex}

\begin{flushright}
	\textbf{\underline{Total points: 42}}
\end{flushright}

%===================================
\section{Consistency Criteria} \label{sec:consistency}
%\subsection{(1 point)} \label{ex:21}
%Explain why quiescent consistency is compositional.

\subsection{(2 points)} \label{ex:37_new}
Give an example of a sequentially-consistent execution that is not safe.

\solution Sequential consistency requires that a history $H$ is equivalent to some legal sequential history that preserves program order, that is to say, for every thread $A$ the subhistory $H|A$ appears in its program order.

A safe register may return any value of its domain when a read overlaps a concurrent write, but a read that does not overlap any write must return the most recently written value.

Thus we can construct an example:

$$
H = write_B(x = false) \to write_A(x = true) \to read_B(x == false) \to read_A(x == true)
$$

Note that the sequential history $write_B(x = false) \to read_B(x == false) \to write_A(x = true) \to read_A(x == true)$ is legal and preserves both $H|A$ and $H|B$, so $H$ is sequentially consistent. But because $B$'s read overlaps no write and still returns $false$ even though $true$ has been written strictly before it, the register is not safe.

\subsection{(2 points)} \label{ex:22}
Consider a \emph{memory object} that encompasses two register components.
We know that if both registers are quiescently consistent, then so is the memory.
Does the converse hold?
If the memory is quiescently consistent, are the individual registers quiescently consistent?
Outline a proof, or give a counterexample.

\solution It does not automatically hold that the individual registers are quiescently consistent.

Consider a history $H$ in which the events $Y_1 = write(y = true)$ and $Y_2 = read(y == false)$ both overlap the event $X = read(x == false)$, but $Y_1$ precedes $Y_2$, i.e. $Y_1 \to Y_2$.

In this history there is no period of quiescence for the memory object, because from the start of the first event to the end of the last event some method call is always pending.

If we unroll the history into the objects' subhistories, we see that one of them is not quiescently consistent:
$H|x = read(x == false)$ and $H|y = write(y = true) \to read(y == false)$.
In subhistory $H|y$ there is a period of quiescence between the write and the read, so the read would have to return $true$; hence register $y$ is not quiescently consistent, even though the memory object is.

%\subsection{(4 points)} \label{ex:23}
%Give an example of an execution that is quiescently consistent but not sequentially consistent, and another one that is sequentially consistent, but not quiescently consistent.
%For each of your examples provide a short proof sketch and an example execution to both show the impossibility regarding one consistency criterion and the compliance regarding the other one.

\subsection{(4 points)} \label{ex:original_1}
Suppose $x,y,z$ are registers with initial values $0$.
Is the history in Figure \ref{fig:cons_crit_hist1} quiescently consistent, sequentially consistent, linearizable?
For each consistency criterion, either provide a consistent execution or a proof sketch.

Does there exist an initial valuation for the registers $x,y,z$ such that your above result w.r.t. linearizability changes? (Meaning if you found the history to be linearizable, can you find initial values for $x,y,z$ s.t. it no longer is linearizable; conversely if you found the history to be not linearizable, can you find initial values s.t. it becomes linearizable?)
If so, provide the initial values and either a proof sketch or an example execution.

\begin{figure}[H]
	\centering \includegraphics[width=16cm]{res/Consistency Criteria Ex.1 monochrome.png}
	\caption{History with five threads and three registers} \label{fig:cons_crit_hist1}
\end{figure}

\solution The history is not linearizable and not quiescently consistent, but sequentially consistent.

We can easily see that it cannot be quiescently consistent if we consider the single period of quiescence in the history.
The first block ends with event $z.read(4)$ and the second block begins with $x.write(1)$, with a period of quiescence in between.
As the history becomes quiescent, the execution so far has to be equivalent to some sequential execution.
We can see that event $z.read(4)$ is impossible, as $z$ starts with initial value $0$.
We would need to re-order the event $z.write(4)$, which is not allowed due to the period of quiescence.

Since linearizability implies quiescent consistency, the history cannot be linearizable either; independently, the first reads are already impossible in real-time order, because registers $x, y, z$ have initial values $0$.
However, if we allow initial values $x = 1$, $y = 3$ and $z = 4$, we can find a linearization of the history.

Figure \ref{fig:ex2-1-4-sol1} shows the unrolled sequential history, in which events have been re-ordered while each thread's program order remains undisturbed, proving sequential consistency.
Figure \ref{fig:ex2-1-4-sol2} shows a valid linearization if the initial values are changed.

\begin{figure}[H]
	\centering \includegraphics[width=16cm]{res/ex2-1-4-sequentialization.png}
	\caption{Unrolled sequential history} \label{fig:ex2-1-4-sol1}
\end{figure}

\begin{figure}[H]
	\centering \includegraphics[width=16cm]{res/ex2-1-4-linearization.png}
	\caption{Linearized history} \label{fig:ex2-1-4-sol2}
\end{figure}

\subsection{(4 points)} \label{ex:original_2}
Suppose $x,y,z$ are stacks and $w$ is a FIFO queue.
Initially the stacks and the queue are all empty.
Is the history in Figure \ref{fig:cons_crit_hist2} quiescently consistent, sequentially consistent, linearizable?
For each consistency criterion, either provide a consistent execution or a proof sketch.

\begin{figure}[H]
	\centering \includegraphics[width=16cm]{res/Consistency Criteria Ex.2 monochrome.png}
	\caption{History with seven threads three stacks and one queue} \label{fig:cons_crit_hist2}
\end{figure}

\solution The history is neither sequentially consistent, nor quiescently consistent, nor linearizable.

For a sequentially consistent execution, we cannot change the thread-local order of events.
If we consider threads $B$ and $E$ operating on object $w$, we can see that $H|B|w = w.deq(3) \to w.deq(4)$, but $H|E|w = w.enq(4) \to w.enq(3)$.
Values $3$ and $4$ are enqueued and dequeued only by threads $B$ and $E$.

Because $w$ is a FIFO queue, whatever is enqueued first must be dequeued first.
From $E$'s program order we see that, no matter what, $4$ is enqueued before $3$; from $B$'s program order we see that, no matter what, $3$ is dequeued before $4$.
This contradicts the FIFO behaviour of $w$, so the history cannot even be sequentially consistent.

There is no period of quiescence, so for this example quiescent consistency reduces to finding some legal sequential ordering of all calls; since none exists, the history is not quiescently consistent either.

By reasoning similar to why the history cannot be sequentially consistent, the history cannot be linearizable.

\subsection{(2 points)} \label{ex:27}
The \texttt{AtomicInteger} class (in \textbf{java.util.concurrent.atomic} package) is a container for an integer value.
One of its methods is

\texttt{\textbf{boolean} compareAndSet(\textbf{int} expect, \textbf{int} update)}.

This method compares the object's current value to \texttt{expect}.
If the values are equal, then it atomically replaces the object's value with \texttt{update} and returns \texttt{true}.
Otherwise it leaves the object's value unchanged and returns \texttt{false}.
This class also provides

\texttt{\textbf{int} get()}

which returns the object's actual value.

Consider the FIFO queue implementation shown in Figure \ref{fig:non_lin_queue}.
It stores its items in an array \texttt{items}, which for simplicity we will assume has unbounded size.
It has two \texttt{AtomicInteger} fields: \texttt{head} is the index of the next slot from which to remove an item, and \texttt{tail} is the index of the next slot in which to place an item.
Give an example execution showing that this implementation is \emph{not} linearizable and provide a short explanation as to why.

\begin{figure}[H]
	\centering \includegraphics[width=10cm]{res/Non-linearizable queue implementation.png}
	\caption{Non-linearizable queue implementation} \label{fig:non_lin_queue}
\end{figure}
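The implementation in Figure \ref{fig:non_lin_queue} can be sketched as follows (a reconstruction from the description above, so details and line numbers may differ from the figure):

\begin{lstlisting}
class Queue {
	AtomicInteger head = new AtomicInteger(0);
	AtomicInteger tail = new AtomicInteger(0);
	Object[] items; // assumed unbounded

	void enq(Object x) {
		int slot = tail.getAndIncrement(); // reserve a slot ...
		items[slot] = x;                   // ... then publish the item: two separate steps
	}

	Object deq() throws EmptyException {
		int slot = head.get();
		Object item = items[slot]; // may still be null if the enqueuer is slow
		if (item == null)
			throw new EmptyException();
		head.getAndIncrement();
		return item;
	}
}
\end{lstlisting}

The gap between reserving a slot and publishing the item is exactly what the following counterexample exploits.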

\solution We can construct an example if we look at what precisely causes a problem in this implementation.

After \texttt{tail} is incremented atomically, a thread can nap (or take a vacation) for an arbitrarily long time before it executes line 10, which stores the enqueued value into the slot the thread has reserved for itself.

Consider history $H$ where $enq(2) \to deq(\text{EmptyException})$, with event $enq(1)$ happening concurrently to $enq(2)$ and $deq(\text{EmptyException})$.
This example is illustrated in Figure \ref{fig:ex2-1-5}.

\begin{figure}[H]
	\centering \includegraphics[width=10cm]{res/ex2-1-5-counterexample.png}
	\caption{Non-linearizable queue execution} \label{fig:ex2-1-5}
\end{figure}

The execution is valid if we consider that $enq(1)$ reserves slot $i$ and $enq(2)$ reserves slot $i + 1$.
If there were no previous elements, \texttt{head} has value $i$ and the dequeue operation attempts to read \texttt{items[slot]} at slot $i$.
If the thread executing $enq(1)$ is still napping, \texttt{items[slot]} at slot $i$ is still \texttt{null}, so the dequeue operation throws an \texttt{EmptyException}.

There are three candidates for a possible linearization, as can be seen in the figure, but the first two, where $enq(1)$ linearizes before $deq(\text{EmptyException})$, cannot be valid for a FIFO queue: the dequeue would then have to return $1$.
The last candidate, where $enq(1)$ linearizes after the dequeue, also fails, but for a different reason.
In this case $deq(\text{EmptyException})$ still cannot return empty, because it strictly follows $enq(2)$ in real time, so the value $2$ is already in the queue at its linearization point.
Thus this queue is not linearizable.

\subsection{(4 points)} \label{ex:32}
This exercise examines a queue implementation, shown in Figure \ref{fig:lin_queue}, whose \texttt{enq()} method has no single fixed linearization point.

\begin{figure}[H]
	\centering \includegraphics[width=14cm]{res/Linearizable queue implementation.png}
	\caption{Queue implementation} \label{fig:lin_queue}
\end{figure}

The queue stores its items in an \texttt{items} array, which for simplicity we will assume is unbounded.
The \texttt{tail} field is an \texttt{AtomicInteger}, initially zero.
The \texttt{enq()} method reserves a slot by incrementing \texttt{tail} and then stores the item at that location.
Note that these two steps are not atomic: there is an interval after \texttt{tail} has been incremented but before the item has been stored in the array.

The \texttt{deq()} method reads the value of \texttt{tail}, and then traverses the array in ascending order from slot zero to the tail.
For each slot, it swaps \emph{null} with the current contents, returning the first non-\emph{null} item it finds.
If all slots are \emph{null}, the procedure is restarted.
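The two memory accesses of \texttt{enq()} referred to below can be sketched as follows (a reconstruction from the description above; line numbers as in Figure \ref{fig:lin_queue}):

\begin{lstlisting}
void enq(Object x) {
	int slot = tail.getAndIncrement(); // line 15: reserve a slot
	items[slot] = x;                   // line 16: store the item
}
\end{lstlisting}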

Give an example execution showing that the linearization point for \texttt{enq()} cannot occur at line 15. (\emph{hint: give an execution, where two \texttt{enq()} calls are not linearized in the order they execute line 15})

Give another example execution showing that the linearization point for \texttt{enq()} cannot occur at line 16.

Since these are the only two memory accesses in \texttt{enq()}, we must conclude that \texttt{enq()} has no single linearization point.
Does this mean \texttt{enq()} is not linearizable?


\solution We consider two examples.

To see why the linearization point cannot occur at line 15, consider the following history where $enq(2) \to deq(2)$ with $enq(1)$ happening concurrently to $enq(2)$ and $deq(2)$, as is the situation in Figure \ref{fig:ex2-1-6-a}.
Line numbers have been annotated to highlight how the code might be executed.

If line 15 were the linearization point, $enq(1)$ would linearize before $enq(2)$, since it executes line 15 first, and a dequeue would then have to return $1$ first. Yet $deq(2)$ is a valid outcome of this execution: the slot reserved by $enq(1)$ may still hold \emph{null} when \texttt{deq()} traverses the array, so the traversal skips it and returns the value $2$.

\begin{figure}[H]
	\centering \includegraphics[width=14cm]{res/ex2-1-6-a.png}
	\caption{First counterexample} \label{fig:ex2-1-6-a}
\end{figure}

To see why the linearization point cannot occur at line 16, consider the same history, but with $deq(2)$ strictly after both enqueue operations.
This situation is displayed in Figure \ref{fig:ex2-1-6-b}.

Event $enq(2)$ might execute line 15 after event $enq(1)$ does, but line 16 before event $enq(1)$.
If line 16 were the linearization point, $enq(1)$ would have to linearize after $enq(2)$, so the FIFO queue would have to return $2$ first and then $1$.
But because $enq(1)$ executes line 15 first, it reserves the earlier slot, so the order of elements in the array is $1$ and then $2$.
Because the dequeue operation happens strictly after both enqueue operations have completed, it cannot skip the value $1$ and must return it first.

\begin{figure}[H]
	\centering \includegraphics[width=14cm]{res/ex2-1-6-b.png}
	\caption{Second counterexample} \label{fig:ex2-1-6-b}
\end{figure}

Oddly enough, these results do not mean that \texttt{enq()} is not linearizable.
Both examples are linearizable if we remove the restriction that a linearization point has to be placed at a specific line.
In general, it cannot be said that if there is no single line on which a method linearizes, the method is not linearizable.

%\emph{[2 bonus points] Come up with a single execution consisting of at most two \texttt{enq()} and one \texttt{deq()} operation, where the linearization points of the two \texttt{enq()} calls are different (not on the same code line).
%Provide a proof sketch why the linearization points of these two \texttt{enq()} calls cannot occur on the same line in your example.}

\section{Consensus} \label{sec:consensus}
\subsection{(6 points)} \label{ex:52}
Show that with sufficiently many $n$-thread binary consensus objects and atomic registers one can implement $n$-thread consensus over $n$ values.

\solution We construct an algorithm to solve $n$-thread consensus over $n$ values.

Initially we allocate two arrays of length $n$, \texttt{proposed} and \texttt{participating}.

Every thread starts by writing its proposed value into the \texttt{proposed} array and then \texttt{true} to the \texttt{participating} array.

Now we can run a series of $\lceil \log_2(n) \rceil$ rounds, using a fresh $n$-thread binary consensus object in each round.
Every thread considers the binary representation of its currently proposed value and proposes its $i$-th bit in round $i$.
If the consensus value matches that bit, the thread simply continues.
If a round decides the opposite bit, the thread scans the \texttt{proposed} array, filters out threads that are not participating by reading \texttt{participating}, and looks for a value whose binary representation matches the sequence of bits decided by consensus so far.
Once it finds such a value, it continues proposing that value instead.
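The round structure can be sketched as follows, where \texttt{bc[i]} is the fresh binary consensus object of round $i$, \texttt{bit(v, i)} extracts the $i$-th bit of $v$, and \texttt{findMatching()} scans \texttt{proposed} and \texttt{participating} as described above (all three names are our own):

\begin{lstlisting}
int decide(int value) {
	proposed[id] = value;
	participating[id] = true;
	for (int i = 0; i < ceil(log2(n)); i++) {
		decided[i] = bc[i].decide(bit(value, i)); // winning bit of round i
		if (decided[i] != bit(value, i)) {
			// our value lost round i: adopt a participating thread's
			// proposed value whose first i+1 bits equal decided[0..i]
			value = findMatching(decided, i);
		}
	}
	return value; // after the last round all threads hold the same bits
}
\end{lstlisting}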

The end result must be that each thread decides the same final value, because each thread adopts exactly the bits decided by the binary consensus object of each round.
The algorithm is wait-free, because every thread terminates after a fixed number of rounds without waiting for any other thread.

It remains to show that the final value is a value that was proposed by a thread.
There are two cases, either the thread wins consensus in every round or it does not.

If it does, then it will have managed to reach its own proposed value as the final value, making it a valid value.

If it does not, then it will have looked into the \texttt{proposed} array to find a matching value.
Such a matching value must always exist, because for every round $i$ some participating thread must have proposed the winning $i$-th bit, otherwise that bit could not have won the binary consensus of round $i$.
The matched value was moreover genuinely proposed by a thread, because only values of threads that have set their flag in \texttt{participating} are considered.

Thus, this algorithm solves $n$-thread consensus over $n$ values.

There is an interesting intuition behind why this works for those who are familiar with bus arbitration protocols (such as the arbitration scheme of the CAN bus).
In such a protocol, every participant that wants to place a value on the bus first tries to transmit its identifier bit by bit.
The bus is wired so that one of the two logical values 0 and 1 is dominant and the other recessive, and the identifiers are ordered such that devices with higher priority have dominant bits and win arbitration.
Our case is almost identical, except that the arbitration happens over a proposed value instead of an identifier.

\subsection{(6 points)} \label{ex:53}
The \texttt{Stack} class provides two methods: \texttt{push(x)} pushes a value onto the top of the stack and \texttt{pop()} removes and returns the most recently pushed value.
Prove that the \texttt{Stack} class has consensus number \emph{exactly(!)} two.

\solution This can be proven analogously to the queue example.

We first consider why the consensus number must be at least two and we will then consider why the consensus number cannot be higher than two.

If we initialize the stack by pushing $LOSE$ and then $WIN$, i.e. $stack.push(LOSE) \to stack.push(WIN)$, we can solve consensus for two threads.
Each thread first writes its input to the shared array \texttt{proposed[ThreadID]} and then calls $stack.pop()$: if the value returned is $WIN$, it decides its own value, \texttt{proposed[ThreadID]}.
If the value returned is $LOSE$, it decides the \emph{other} thread's value, \texttt{proposed[1 - ThreadID]}.
Both threads decide on the same proposed value in a finite number of steps.
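The two-thread protocol just described can be written out as follows (a sketch; \texttt{proposed} is a shared two-element array):

\begin{lstlisting}
// the stack is initialized with stack.push(LOSE); stack.push(WIN);
int decide(int value) {
	proposed[ThreadID] = value;
	if (stack.pop() == WIN)
		return proposed[ThreadID];     // we popped first, our value is decided
	else
		return proposed[1 - ThreadID]; // the other thread popped WIN first
}
\end{lstlisting}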

Now we consider why the stack's consensus number can be at most two.
Consider three threads A, B and C.

We know that there must exist a critical state in which the next move of a single thread decides the outcome of the consensus protocol.
Without loss of generality, we assume that thread A's next move brings the protocol into a 0-valent state, while thread B's next move brings the protocol into a 1-valent state.
Their next moves must be one of the two possible operations on the same stack.
Now we consider all possible cases.

If threads A and B are about to pop, then two elements will be removed from the stack.
Either A moved first, bringing the protocol into a 0-valent state, or B moved first, bringing the protocol into a 1-valent state.
Because the end result looks the same to C, no matter what it attempts to do, C cannot decide whether 0 or 1 is correct.

If threads A and B are about to push, then two elements will be pushed onto the stack.
Their push operations alone cannot decide the protocol, because neither A nor B gains any new information by pushing.
It follows that threads A and B must pop at some point.
We can now bring about the same situation as before and C cannot decide whether 0 or 1 is correct.

The last two cases are those where one thread pushes a value and the other thread pops a value.
By a similar argument, if A and B move before C and the push is immediately followed by the matching pop, the stack returns to its previous state, and it is once again indistinguishable for C whether 0 or 1 was decided.

Thus, a stack with these two operations can have a consensus number that is at most two.

Combining our two results, we learn that a stack with these two operations has consensus number exactly two.

\subsection{(2 points)} \label{ex:54}
Suppose we augment the FIFO \texttt{Queue} class with a \texttt{peek()} method that returns but does not remove the first element in the queue.
Show that the augmented queue has infinite consensus number.

\solution To show that this extended FIFO queue has an infinite consensus number, we construct an algorithm that solves $\infty$-thread consensus.

Let each thread enqueue its proposed value and then call \texttt{peek()}.
Each thread decides on the value returned by \texttt{peek()}.
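Written out, the protocol is simply:

\begin{lstlisting}
int decide(int value) {
	queue.enq(value);
	return queue.peek(); // the first enqueued value is decided by everyone
}
\end{lstlisting}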

To prove that this algorithm solves $\infty$-thread consensus, we distinguish between two cases.
Either the thread was the first to enqueue a value or it was not.

If it was first, then the thread peeks its own value and decides on it.

If it was not, then the thread does not peek its own value but the value of the thread whose enqueue came first.

Because every thread always peeks the first enqueued value, every thread decides on the same value.
The enqueue and peek operations return in a finite number of steps and do not depend on any other thread, so this algorithm is wait-free.
The value returned by peek must have been proposed by some thread, making it valid.
Because every thread peeks only after enqueueing its own value, the queue can never be empty when peek is called.

Thus this extended FIFO queue has consensus number $\infty$.

\subsection{(2 points)} \label{ex:55}
Consider three threads, $A$, $B$, $C$, each of which has an MRSW register, $X_A$, $X_B$, $X_C$, that it alone can write and the others can read.

In addition each pair shares a \texttt{RMWRegister} register that provides only a \texttt{compareAndSet()} method: $A$ and $B$ share $R_{AB}$, $B$ and $C$ share $R_{BC}$, $A$ and $C$ share $R_{AC}$.
Only the threads that share a register can call that register's \texttt{compareAndSet()} method -- which is the only way to interact with said register.

Your mission: either give a consensus protocol and prove that it works or sketch an impossibility proof.

\solution There cannot exist a consensus protocol for three threads with the given primitives and resources.

To prove this, we consider what operations are available and why none of them can solve consensus.

We already know that the MRSW registers cannot be used to solve 3-thread consensus, because they have consensus number 1.

Thus the threads have to use the shared RMW registers to solve consensus.
Suppose without loss of generality that A must make a critical move before B and B before C.
The only operation available is CAS, so the RMW registers can only be used to determine if the CAS operation succeeded or not (has returned $true$ or $false$).
The only way in which CAS can succeed is if the threads know which value to expect.
This value cannot be guessed at random, because then there would be only a small probability of any CAS succeeding.
We therefore assume that the registers are initialized beforehand to a value that is known and common to all threads, for example $-1$.

If a thread tries to expect any other value, CAS can never succeed and the thread won't learn any new information.
Thus every thread expects $-1$.
Because the value each thread writes must be distinguishable, we can assume that each thread attempts to write its own thread ID, because that is its only differentiating property.

The CAS operation returning true informs a thread of its success, which means that it was first to attempt CAS on the shared register.
Both A and B succeeding before C therefore means that $R_{AC}$ will hold A's ID and $R_{BC}$ will hold B's ID.
A and B know that they succeeded against C, and they can use the remaining shared register to determine that A was first, to decide on the proposed value of A.

But there is no way for C to determine what value to use.
If B were to succeed before A instead, the situation would look identical to C, so C cannot in general decide on the same value as A and B.
Thread C could try to see if any of the MRSW registers was updated, but that would violate the wait-free property that is required of our consensus protocol.

For any other order in which one thread succeeds twice with its CAS operations, the end result is the same.
In the remaining case no thread succeeds twice, which implies that each thread succeeds on exactly one shared RMW register.
This case is even more hopeless: there is no way to determine whose proposed value to use, because no thread is distinguished from the others.

\subsection{(6 points)} \label{ex:56}
Consider the situation described in \ref{ex:55} except that $A$, $B$, $C$ can apply a \texttt{doubleCompareAndSet()} operation to both of their shared registers at once.
Regarding the semantics and signature of this new operation consider two variations \footnote{If any of the \texttt{doubleCompareAndSet()} variations is called by a thread for at least one register, which it does not have access to, we assume the method throws an \texttt{IllegalAccess} exception with no further side effects.}:
\begin{enumerate} [label=\alph*)]
	\item Suppose the signature is

		\texttt{\textbf{bool} doubleCompareAndSet(RMWRegister shReg1, RMWRegister shReg2, \textbf{int} expect, \textbf{int} update)}.\\ \label{it:ex56_a}
		For a call of \texttt{doubleCompareAndSet(X, Y, exp, up)}, atomically the following is done: it checks if the value in \texttt{X} and \texttt{Y} is equal to \texttt{exp}.
		If both are equal to \texttt{exp}, \texttt{X} and \texttt{Y} are updated with \texttt{up} and \texttt{true} is returned.
		Otherwise no update is conducted and it returns \texttt{false}.
		The equivalent description in code can be seen in Listing \ref{list:DCAS_1}.
	\item Suppose the signature is

		\texttt{TwoBool doubleCompareAndSet(RMWRegister shReg1, RMWRegister shReg2, \textbf{int} expect, \textbf{int} update)},\\ \label{it:ex56_b}
		where \texttt{TwoBool} is a simple class consisting of two (publicly accessible) \texttt{\textbf{bool}} values.

		\texttt{\textbf{class} TwoBool \{ \textbf{public bool} val1; \textbf{public bool} val2; \}} \\
		Consider the semantics given by Listing \ref{list:DCAS_2}.
\end{enumerate}

\begin{lstlisting} [numbers=left, numbersep=5pt, tabsize=4, label=list:DCAS_1, caption=doubleCompareAndSet() - variation 1]
bool doubleCompareAndSet(RMWRegister shReg1, RMWRegister shReg2, int expect, int update)
{
	bool retVal = false;
	atomic
	{
		if(shReg1 == expect && shReg2 == expect)
		{
			shReg1 = update;
			shReg2 = update;
			retVal = true;
		}
	}
	return retVal;
}
\end{lstlisting}
\begin{lstlisting} [numbers=left, numbersep=5pt, tabsize=4, label=list:DCAS_2, caption=doubleCompareAndSet() - variation 2]
TwoBool doubleCompareAndSet(RMWRegister shReg1, RMWRegister shReg2, int expect, int update)
{
	TwoBool retVal;
	atomic
	{
		retVal.val1 = shReg1.compareAndSet(expect, update);	// the 'usual' CAS operation on a
		retVal.val2 = shReg2.compareAndSet(expect, update);	// single register
	}
	return retVal;
}
\end{lstlisting}

For both semantics of the \texttt{doubleCompareAndSet()} operation either provide a consensus protocol and prove that it works or sketch an impossibility proof.
(Note that threads can still use their shared register's normal \texttt{compareAndSet()} method!)

\solution \textbf{(a)} With this new double-CAS operation, we can solve 3-thread consensus.

For the consensus object, let the shared RMW registers be initialized to a predetermined value, e.g. $R_{AB} = R_{BC} = R_{AC} = -1$.

Let every thread try to double-CAS on its shared RMW registers after writing its proposed value to its MRSW register.
If it succeeds, it uses its own proposed value.
If it does not, then it uses CAS on its shared RMW registers, expecting the default value $-1$.
The success of this CAS operation tells the thread whether the other thread sharing this RMW register has succeeded with its double-CAS (a CAS returning $false$ implies that some thread changed the value, which by definition is the partner sharing this RMW register).
After determining which thread succeeded with its double-CAS, it reads the appropriate MRSW register to determine the consensus value.

This algorithm solves 3-thread consensus.
Every thread acquires a value that was proposed, because it either uses its own proposed value or the value of a thread proposed before it.
The first thread to succeed with double-CAS on its shared registers will determine the proposed value and because of the atomicity of double-CAS, any other thread will be prevented from succeeding with its own double-CAS operation.
Every thread will determine correctly which thread succeeded with double-CAS and use its value, which must have already been written before it executed double-CAS.
The sequence of operations finishes in a finite number of steps and no thread depends on the result of another thread, so this protocol is wait-free.
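The protocol can be sketched as follows for thread A (B and C act symmetrically on their own registers; using $-1$ as the update value of the probing CAS is our choice, so that a successful probe leaves the register unchanged):

\begin{lstlisting}
int decide(int value) { // code for thread A
	X_A = value; // publish the proposal first
	if (doubleCompareAndSet(R_AB, R_AC, -1, ID_A))
		return X_A; // our double-CAS was first and decides
	// exactly one other thread won; probe with a side-effect-free CAS
	if (!R_AB.compareAndSet(-1, -1))
		return X_B; // R_AB no longer holds -1, so B's double-CAS won
	else
		return X_C; // otherwise C's double-CAS won
}
\end{lstlisting}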

\solution \textbf{(b)} With this new double-CAS operation, we cannot solve 3-thread consensus.

The algorithm provided in (a) cannot be used because for this double-CAS operation, it is not guaranteed that a thread will either succeed on both shared registers or fail on both registers.

As we have seen in \ref{ex:55}, to solve 3-thread consensus the operation of each thread cannot be a read/write to a MRSW register or a CAS on one of its shared RMW registers.
It follows that every thread uses double-CAS.
By similar reasoning as before, all threads must expect a common value; we assume this value to be $-1$.
The first thread to use double-CAS will succeed on both of its shared registers.
The next thread to use double-CAS therefore has to succeed on one shared register but fail on the other.

This partial success of the double-CAS operation destroys an important piece of information by removing the initial value $-1$ from the last shared register.
The thread that comes last is then in the same situation as in \ref{ex:55}: it cannot determine which thread executed its double-CAS first, because it can only CAS on its shared registers and will not find $-1$ in either of them.

It follows that this double-CAS operation cannot be used to solve 3-thread consensus.

\subsection{(2 points)} \label{ex:59}
The \texttt{SetAgree} class, like the \texttt{Consensus} class provides a \texttt{decide()} method whose call returns a value that was the input of some thread's \texttt{decide()} call.
However unlike the \texttt{Consensus} class, the values returned by \texttt{decide()} calls are not required to agree.
Instead these calls may return no more than $k$ distinct values.
(When $k$ is 1, \texttt{SetAgree} is the same as consensus.)

What is the consensus number of the \texttt{SetAgree} class when $k > 1$?
Sketch a proof!

\solution To determine the consensus number, we can restrict ourselves to a simple case, namely $n$-thread binary consensus with $k = 2$.

The domain of this problem consists only of two values $true$ and $false$, so an $n$-thread binary consensus protocol can only decide on one of these two values.

Because $k = 2$, the \texttt{decide()} calls may return up to two distinct values, i.e., any combination of $true$ and $false$, so the object imposes no coordination at all in the binary case.
It follows that such a class cannot solve $n$-thread binary consensus for any $n > 1$.
Since \texttt{SetAgree} with a larger $k$ is only weaker, it cannot be used to solve consensus for any $k > 2$ either.
The consensus number of the \texttt{SetAgree} class with $k > 1$ is therefore 1 (that is to say, it cannot solve consensus between even two threads).

\end{document}
